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SUMMARY 


Two difficulties affecting the implementation of the alternating direction 
implicit (ADI) method on the Control Data STAR-100 computer are discussed. The 
first is that much of the computation is in the solution of tridiagonal systems 
of equations, and investigations have shown that parallel or vector tridiagonal 
solvers are not particularly efficient on the STAR computer until the size of 
the system is quite large- Second, since the direction of implicitness alter- 
nates at each half step in the algorithm, considerable data rearrangement is 
required to efficiently use a vector algorithm in both directions. 

Two parallel algorithms are described which solve M independent sys- 
tems of tridiagonal equations. Each algorithm solves the M systems more effi- 
ciently than a parallel solver used M times on one system. The two algorithms 
are compared by using STAR-100 timing. It is then shown that the storage require 
ments for the two algorithms are such that they can be used together in an ADI 
implementation which has no requirement for data rearrangement. An average vec- 
tor length is derived for the ADI method which shows that in some instances the 
method may be very efficient for the STAR-100 computer- 

several other algorithms which result in the need to solve many systems 
of tridiagonal equations are discussed. 


INTRODUCTION 

The alternating direction implicit (ADI) method for solving elliptic 
partial differential equations has proved to be a very effective method for a 
restricted class of problems. It is also widely used in solving parabolic par- 
tial differential equations because it has excellent stability characteristics. 
Part of the popularity of this method is due to the fact that the main computa- 
tional work per iteration involves the repeated solution of sets of tridiagonal 
equations. Whereas this latter characteristic has been one of its strengths on 
the conventional serial computer, studies have indicated that this may be a 
weakness on a vector computer. (See ref. 1.) Another difficulty arises if 
there is a requirement that vector elements be stored in contiguous locations, 
as on the Control Data STAR-100 computer. There, an algorithm deemed efficient 
for solving the tridiagonal systems of equations in one direction cannot be 
used when the direction of implicitness is alternated unless considerable data 
movement is performed- This movement, corresponding to the transpose of the 
matrix of grid values, represents an overhead to the algorithm which can be 
quite costly. 

Two approaches to solving the tridiagonal sets of equations are presented. 
The first is the vectorization of an approach suggested for the ILLIAC IV in 
reference 2 where each parallel processor is used simultaneously to solve one 
of the tridiagonal systems- This is referred to as (SS) for simultaneous solu- 



tion. The second method is motivated from the study of the odd-even reduction 
parallel algorithm in reference 1 which showed that for a large single system, 
this parallel approach was considerably better on the STAR-100 vector computer 
than the other methods considered. This report shows that M independent sys- 
tems of tridiagonal equations, each of size N, can be linked together to be 
solved as one syston of size MN with few additional computations required. 

In addition, it is proved that a system artificially coupled in this manner can 
be solved by the odd-even algorithm in log 2 N steps instead of the expected 
log 2 MN steps. This algorithm is referred to as the (LP) algorithm for linked 
parallel. It is then possible to show that the storage requirements for the 
two algorithms ((SS) and (LP)) are such that they can be used together to avoid 
the necessity of rearranging the data between half steps. It should be noted 
that the ILLIAC IV has a storage scheme which allows the access of both rows and 
columns of a matrix so that one can conveniently use (SS) to solve the equations 
in both directions. Similarly, Texas Instruments Advanced Scientific Computer 
(ASC) can access noncontiguous locations with a fixed separation but at reduced 
rates. The average vector length for the ADI method is derived and arguments 
are made that it can be an efficient algorithm for the STAR ccmputer in some 
circumstances. Other numerical methods which require the solution of tridiag- 
onal systems (e.g., successive line over relaxation (SLOR) or Crank-Nicolson) 
are also discussed. 


SYMBOLS AND ABBREVIATIONS 
A matrix of coefficients 

ADI alternating direction implicit method 

Ai,Bi,Ci submatrices in (SS) and SLOR algorithms 
a,b,c diagonals of tridiagonal matrix 

ai,b,c diagonals of reduced tridiagonal matrix in odd-even reduction algorithm 
Bi,Ci toBi and , respectively 

d diagonal of U factor in equations (1) to (3) 

g solution to forward substitution in equation (2) 

i,j,k indexing integers 

(LP) linked parallel algorithm 

L,U factors of matrix A 

L2fQl»Q2 submatrices of Q in equation (11) 

S, subdiagonal of L factor in equations (1) to (3) 

£ average STAR- 100 vector length in ADI method 
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M,N 


grid dimensions 


Nx,Ny number of columns and rows, respectively, in grid 
N N/2 

N' largest power of 2 equal to or greater than N 

p MN 

P P/2 

Q,R implicit and explicit matrix representations, respectively, in itera- 

tive procedure 

ratio of STAR-100 times for ADI algorithms as defined in equation (15) 
t,r right-hand side vector in matrix equations 

r ith subvector from r 

SLOR successive line overrelaxation 

(SS) simultaneous solution algorithm 

s,t vectors defined in equations (5) and (6) 

T block diagonal matrix in which each diagonal submatrix is tridiagonal 

Tge star- 100 time to solve tridiagonal system by using serial Gauss elimi- 

nation (minor cycles) 

Tj ith tridiagonal submatrix of T 

Tlr(M,N) star- 100 time to solve M independent tridiagonal systems of size N 
by using (LP) algorithm (minor cycles) 

Tqe star- 100 time to solve a tridiagonal system by using odd-even reduc- 

tion (minor cycles) 

Tgg(M,N) STAR-lOO time to solve M independent tridiagonal systems of size N 
by using (SS) algorithm (minor cycles) 

Trp(M,N) STAR-lOO time to transpose an M x n matrix (minor cycles) 

u solution vector 

u(^) ith subvector of u 

v,v^^) ,w,w^^J vectors used in equation (17) 

x,y,z coordinate directions 
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T vectors of coefficients for finite-difference equations 
6 maximum of Nx and Ny 

M minimum of Ny and Ny 

0 number of vector operations of length N^ in ADI implementation 

cp number of vector operations of length N in ADI implonentation 

w relaxation parameter in SLGR 


VECTCR TRI DIAGONAL SOLVERS 

The usual approach to solve a diagonally dominant system Au = r, which is 
given by 


ai bi 
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is to apply Gauss elimination (or some equivalent form such as the Thomas 
algorithm). The algorithm to factor A as A = LU where 
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is given by 


di = ai 


(i = 2, . . N) (1) 


(i = 2, . . N) (2) 


(i = N-1, . . 1) (3) 

This is a very efficient algorithm requiring only 8N arithmetic operations. 
However, when considered for vector (or parallel) computers, the algorithm is 
unsuitable since the ccmputations are of a recursive nature. The algorithm 
given by equations (1) to (3) is said to be nonvectori zable since it must be 
carried out primarily with scalar code on a vector canputer (assuming that this 
is preferable to vectors of length one). Several parallel algorithms that use 
direct methods have been proposed and investigated in references 1, 3, 4, and 5. 
Iterative methods have been proposed in references 6 and 7. 

A comparison of the predicted performance of most of these methods on the 
STAR-100 computer is contained in reference 1. There it is shown that, for most 
situations, the one-dimensional version of cyclic (odd-even) reduction performs 
the best. Briefly, this algorithm requires log 2 N major steps to solve a sys- 
tem of size N. At each step, one-half of the unknowns are eliminated, leaving 
a new tridiagonal system half as large. A good estimate for the STAR-100 timing 
for this algorithm is 

Tqe “ 5250 log2 N + 46N (4) 

where Tqe is given in units of STAR minor cycles (40 nsec). In figure 1, 
the ratio of Tqe to Tqe (the time for a scalar implementation of Gauss elim- 
ination on the STAR-100 computer) is plotted as a function of the number of 
equations N. This algorithm is described in reference 1, and experiments indi- 
cate that Tqe “ 1275 + 255N. It is clear that even the most promising parallel 
solver is not relatively efficient until the size of the system is quite large. 
Since the size of a system using ADI is the number of grid points in one direc- 
tion of the grid, it is not likely that the parallel algorithm will be useful 
in a straightforward implementation of ADI. 


S-i - Ci/di_l ^ 
di = ai - 5<ibi_i j 
Then, Lg = r is solved by 
91 = ri 

gi = ri - S-igi-i 
Finally, Uu = g is solved by 

= 9n/% 

Ui = (91 - ui+ibi)/di 
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ADI METHOD 


The ADI method is discussed in detail in many references. (See, for exam- 
ple, ref. 8.) Although the method is applicable to three-dimensional problans, 
this paper will only address the two-dimensional problem except where specifi- 
cally noted. Briefly, when solving Au = r with the ADI method, the A matrix 
is expressed as the sum of two matrices; this leads to the two-step procedure to 
advance the solution from step 2k to step 2k + 2 

Hu2k+1 = s (5) 

and 

Vu2k+2 = t (6) 

The vectors s and t are functions of u^k and u^^+l, respectively. The 
matrices H and V are chosen so that they are invertible, and if the method 
is to be of practical value, easily invertible. When, for example, A arises 
fron the application of the five-point difference formula to a general second- 
order elliptic partial differential equation, H corresponds to the finite- 
difference approximation in the x-direction and V corresponds to derivatives 
in the y-direction. The matrices H and V may include terms added to the 
diagonal to increase stability and accelerate convergence. Under suitable per- 
mutations, both H and V are tridiagonal. In solving equation (5) , the direc- 
tion of implicitness is said to be in the x-direction; similarly, in solving 
equation (6), the direction of implicitness is said to be in the y-direction. 

For illustrative purposes, consider the implementation of the ADI method 
for a second-order elliptic partial differential equation with Dirichlet bound- 
ary conditions on the model mesh (fig. 2)!, Let N be the number of interior 
grid lines in either direction, which here is N = 3. Since the order in which 
the grid points and the coefficients at each grid point are stored in the com- 
puter is important, assume that the number of the grid point also indicates its 
relative position in the pertinent arrays. Therefore, all arrays are stored so 
that they correspond to rows of the grid. Assume further that the solution has 
advanced through 2k steps and that at step 2k + 1 , the direction of implicit- 
ness is in the x-direction. The equation for any point i is of the form 

[otiUi_i + BiUi + YiUi+i]2k+l = ri - [piUi_N + PiUi + TiUi+N]^'^ 

and when all the points for that row of the grid are grouped together, a tri- 
diagonal system of equations results. There is one such system for each row in 
the grid. The equations for, say, the second interior row are 


^12 y^2 0 

U12 

2k+l 

ri2 - “12^11 - P12U7 " “12U12 " ^12^17 

“13 ^13 Yi3 

U13 

= 

ri3 - P13U8 - ai3Ui3 - Ti3U18 

0 ai4 3 i4 

ui4_ 


/14 - Y14U15 - P 14 U 9 - “14^14 - T14U19 


Note that quantities in equation (7) are stored appropriately to allow vector 
operations of length N in the evaluation of the right side of the equation. 
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with the exception of the subtraction of the boundary value in the equation for 
the first and last point of each row. Also, each diagonal of the matrix has its 
elements stored consecutively, as required when using the cyclic reduction paral- 
lel algorithm discussed in the section entitled "Vector Tridiagonal Solvers." 

All of the independent tridiagonal systems (one for each row) to be solved 
at step 2k can be grouped into one matrix equation which has the form 

Qu2k+1 _ Ry2k + j 

where the matrices Q and R are as shown in figure 3 if it is assumed that 
any references to the known boundary values are included in f. It is instruc- 
tive to view the overall structure because the algorithms to be used in the ADI 
implementation will be based first upon the computation of Ru*^ + r (in which 
long vector operations can be used) and then on a solution to the matrix equa- 
tion which exploits the structure of Q (as opposed to solving each system one 
at a time ) . 

When the direction of implicitness is in the y-direction, the resulting 
equation for the ith point is 

[PiUi-N + OiUi + = ri - [aiUi_i + BiUi + YiUi+i]2k+l (8) 

When the equations for all points in a column of grid points are grouped 
together, a tridiagonal matrix arises. The matrix equation for the second 
column is ' ■ 

~0 8 T8 0 Ifus - “8“7 -^808 “781^9 

P13 <^13 '^13 >^13 = - “13^12 " ^13^13 * Y)3i^l4 (9) 

0 Pl8 <Pl8 U)8J [>^18 - '^18^23 " <^18^17 ~ ^18^18 ~ ^18^19 

Observe that the right side of equation (9) cannot be evaluated with vectors of 
length N, nor are the diagonals of the tridiagonal system stored consecutively. 
When all of the independent systems at step 2k + 1 are grouped, the matrix 
equation is of the form 

Qu2k+2 = Ru^k+I + £ 

where Q and R now have the structure shown in figure 4. In order to have 
the same structure as in figure 3, one would need, for instance, U8, followed 
by u-|3, followed by ui8» etc. This, of course, corresponds to the transpose 
of the original storage scheme. On a serial ccmputer, the different structure 
presents' no problem, at least not when all the data are contained in central 
memory. The arrays of coefficients and the previous solution vector can be 
stored in a two-dimensional array and accessed from there one at a time. How- 
ever, on a vector ccmputer, one apparently must transpose the arrays in order 
to use the same algorithm in both steps. As ah alternative to the data move- 
ment, two parallel approaches to solving the systems of tridiagonal equations 
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are described. It is then shown that when used together, they remove the neces- 
sity to rearrange the data. 


SIMULTANEOUS SOLUTION VECT0RI2ATI0N 

Since each of the tridiagonal systems is independent of the others, any 
algorithm for solving one of the tridiagonal systems becomes a parallel algo- 
rithm when it is used to perform the identical operations on the corresponding 
coefficients of each system. Hence, with M systems, each vector operation 
would be of length M. The most obvious algorithm to use is the serial algo- 
rithm in equations (1) to (3). This is the general approach used in reference 2 
and it is referred to here as (SS). The only difficulty is to determine what 
storage arrangement relative to the direction of implicitness is required. In 
reference 2, the coefficients for any one system are stored in one of the pro- 
cessors, and each processor solves the system whose coefficients are stored in 
its memory. 

Matrix equation (9) arises when the direction of implicitness is opposite 
the direction of storage. A particular scalar operation for column two, say 
Pl3/CJ8r must also be executed for columns one and three and will occur as 
Pi 2/^7 and Pi 4 /U 9 . Hence, the scalar operation for each system can be per- 
formed as a one-vector operation of length M. The conclusion is that (SS) 
works when the direction of implicitness is opposite the direction of storage 
(or numbering) of the grid. A quick glance at equation (7) shows that the same 
approach would not work when the direction of implicitness corresponds to the 
numbering of the grid. 

Another view of this method can be obtained by examining the structure 
of Q and R shown in figure 4. The matrix equation is of the form 


Al 

Bl 

0 

■u(l)' 


‘r H)' 

^2 

A2 

B2 

u(2) 

= 

r (2) 

0 

C 3 


u(3) 


r (3) 


where A^, B^, and Cj are diagonal matrices of size M, u^^) refers to 

the ith row of unknowns, and r(^) is the corresponding values of the right- 
hand side. If the diagonal of each diagonal matrix is stored as a vector and 
the obvious rules for multiplying, inverting, adding, and subtracting diagonal 
matrices are observed, it is clear that there is a l-to-1 correspondence with 
the operations in the scalar algorithm that is given by equations (1) to (3) 
except that each operation is now a vector operation of length M. This algo- 
rithm has been coded in STAR FORTRAN and timed. For M independent systems 
of equations, each of size N, 

Tgs(M,N) « 1700N + 7.5NM (10) 

Since N refers to the size of the system (the number of points in the direc- 
tion of implicitness), Tgs(MfN) is smaller when the direction of implicitness 
is along the side of the rectangle that has the smallest number of grid points. 
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LINKED PARALLEL VECTORIZATION 


Now consider another approach to solving the M independent tridiagonal 
systems, each of size N. When the direction of implicitness corresponds to 
the direction in which the grid points have been ordered, the storage is correct 
for using the parallel odd-even reduction algorithm. (See eq. (7).) However, 
since the size of each system is the number of columns of the grid (still assum- 
ing the grid is ordered by rows) , the algorithm would not appear to perform 
particularly well unless a very fine grid size is used. However, again it is 
instructive to view the overall structure as given in figure 3. If each of the 
M independent tridiagonal systems is denoted by Tju(i) = r(^), the resulting 
matrix equation Tx = r is given by 


Tl 


'u(l)' 


‘r (1)’ 

T2 

0 

u(2) 


r(2) 

• 

0 

« 

« 

« 

• 

u(M) 


• 

r (M) 


It is clear that the matrix T is also tridiagonal, of size p = NM, and can 
be solved with the odd-even reduction algorithm. The advantages of solving T 
as one larger system of size p are clear from figure 1. In addition, the fol- 
lowing theoron proves that it is not necessary to carry out all of the log 2 P 
steps, of the algorithm due to the special structure of T. 

Theorem: If the tridiagonal matrix T consists of M independent tridi- 

agonal systems of size N, the odd-even parallel algorithm can be terminated 
after log 2 N steps. 

Proof; For simplicity, assume N is a power of two. Denote the diagonals 
of T by c, a, and b, and their ith ccniponent by c^, ai, and b^j^. Note 
that for these uncoupled systems, 

cl+(k-l)N ~ ® (k = 2, 3, . . ., M) (11) 

and 

^N+(k— 1)N - 0 (k = 1 , 2 , . . , , M — 1 ) (12) 

At each step of the algorithm, half of the unknowns are eliminated, leaving a 
reduced tridiagonal system half as large as the original. Consider the first 
reduced system after one step of the algorithm. Call the diagonals of this 
syston c, a, and b. Then, as shown in reference 1 , 


9 



■C2jC2j-l 


j = 2, 3, 


c 


j 


a2j-l 


and 


-b2jb2j+i 

a2j+l 





Now, in particular, when j = 1 + (k - 1)N for k = 2, 3, . . ., M and 
N = N/2, then 

C2j-1 = C2+2(k-1)N-1 = ci + (k-l)N = 0 
Also, when j = N + (k - 1)N for k = 1, 2, . . ., M - 1, then 


t>2j - t>2N+2(k-l)N - bN+(k-l)N ~ ° 

Since Cj and bj are zero for those M - 1 values of j, the p x p system 
still has M - 1 zero superdiagonal and subdiagonal coefficients separating 
the independent systems. The same argument shows that after log 2 N steps, 
the reduced system, which is now M x m, still has M - 1 subdiagonal and super- 
diagonal zeroes. Hence, it is diagonal. 

The assumption that N was a power of two was made only to allow the . pre- 
cise identification of the M - 1 zero subdiagonal and superdiagonal elements 
in the reduced system. It is clear fran equations (11) and (12) that the zero 
off-diagonal elements must be propagated for any N. The elimination procedure 
can be stopped after log 2 N' steps where N' is the smallest power of 2 equal 
to or greater than N. Q.E.D. 

The approximate timing for the M systems. of size N is 

Tlp(M,N) = 5200 log2 N + 46MN (13) 


MIXED ADI ALGORITHM 


The discussion concerning the (SS) and (LP) algorithms indicates that (SS) 
is appropriate when the direction of implicitness is opposite the direction in 
which the grid is ordered and that (LP) is appropriate when the direction of 
implicitness is the same as the grid ordering'. This indicates that a mixed 
algorithm can be used which will remove the necessity for rearranging the data. 
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Assume that the grid is ordered by rows [columns] of the grid'. Then 

(1) When implicit in the x-direction, use the (LP) [(SS)1 algorithm 

(2) When implicit in the y-direction, use the (SS) [(LP)] algorithm 



Once the grid is ordered, the direction in which (SS) and (LP) are used 
is determined. When the grid is square, the storage selection makes no dif- 
ference. However, when the number of rows Ny does not equal the number of 
columns N^, the grid ordering should be selected to minimize the sum of the 
time for each algorithm. Let y = min (Njj,Ny) and 6 = max (Njj,Ny) . Then 
if the grid is ordered in the direction corresponding to the maximum dimension, 
the time T(ADI) for the mixed algorithm is 

T(ADI) = Tlp(M, 6) + Tss('S,y) 

= 5250 log 2 6 + 1700y + 53.56y (14) 

If the grid is ordered along the minimum dimension 
T(ADI) = TLp{6,y) + Tss(y,«S) 

= 5250 log 2 y + 17006 + 53.56y 

Since 6 ^ y 

5250 log2 5 + 1700y 

g ] 

5250 log 2 y + 17006 

for any y ^ 8. Thus, the grid should be ordered consecutively along the 
largest dimension to yield the time given in equation (14). 

The choice between the mixed algorithm and the more conventional algorithm 
(that is, (SS) in both directions with matrix transposes) is problem dependent. 
The advantage for the mixed algorithm occurs when the region is nonsquare and/or 
when more than one data array must be transposed. This can be seen fran table I 
which gives the comparison for several combinations of M and N by using 
equations (10) and (13). In that table 

Tgg(y,6) + Tgs(6,y) + 2kT.j(y,6) 

Rk = (15) 

TLp(y,6) + Tss(fi,y) 

is the ratio of the times for the conventional to the mixed algorithm where k 
is the number of data arrays which must be transposed at each half step and 
Tip(M,N) is the time to transpose an M x n matrix. 

The use of different algorithms in different directions also applies to 
the three-dimensional problem. Consider the ADI method applied to a cube such 
as shown in figure 5. Assuming a dimension of N, the corresponding algorithm 
would then be 

(1) Implicit in the z-direction: Use (LP) algorithm on one system of 

size 
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(2) Implicit in the x-direction: Use (SS) algorithm N times, once for 

each square grid in the x-z plane. Vector operations are of 
length N 

(3) Implicit in the y-direction: Use (SS) algorithm once on all tri- 

diagonal systems. Vector operations are of length 


AVERAGE VECTOR LENGTH IN ADI 

Even though the average vector length in the solution to the M triadiag- 
onal systems is only M with (SS) , the length for the entire algorithm may be 
considerably higher because the coefficients a^, Bj, Yi# Pi» ^if and 
in equation (8) may be all evaluated with vectors of length MN since the equa- 
tions are independent. In many cases, this computation may dominate the time 
so that much of the work involves long vectors. To estimate the average vector 
length I , assume that M = N and that 0 vector operations of length are 

performed to evaluate the coefficients; also assume that cpN vector operations 
of length N are required to solve the resulting systems, as in the (SS) algo- 
rithm. Then 


0n 2 + cpN(N) /cp + 0\ 

£ = «= N 

0 + cpN \ ' 


Counting a multiplication as one operation, an addition as one-half, and a divi- 
sion as two, tp = 7.5 so that 


I 


h.5 + o'' 


7.5 


N 


The value for 0 is at least 4.5 to evaluate Ru^ + b and, of course, can 
be considerably larger for nontrivial coefficient evaluation. 


The evaluation of the coefficients and the right-hand side with vectors 
which are 0 (MN) requires the boundary points to be included in the vector of 
unknowns. To have a valid equation for a point i on the boundary, set 


Pi = Ti = Oi = 6i = Yi = 0 Oi = 1 ri = Ui 

in, for instance, equation (8), where u^ is the fixed boundary value for the 
ith point. 


OTHER APPLICATIONS 
SLOR 

The two algorithms can, of course, be used in any application where there 
exist a number of independent tridiagonal systems to solve. It may be neces- 
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sary, however, to impose an ordering other than the usual row-by-row ordering 
on the grid to obtain the independence required for these algorithms. For 
instance, the usual row by row ordering of the grid in figure 2 yields a block 
tridiagonal matrix Ci,Ai,Bi where Cf and are diagonal of size N and 

each Aj is tridiagonal. When the successive line overrelaxation (SLOR) method 
is applied to this matrix, the following matrix equation must be solved at each 
step where Cj = and O) is the relaxation factor: 



Each u^i) is the updated solution on the ith row of the grid. Here the tridi- 
agonal equations are not independent and must be solved one at a time. However, 
as pointed out in reference 2, if the odd rows are improved first, followed by 
the even rows, the resulting task requires two solutions of N/2 independent 
systems of equations. Here the matrix equation is 
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where also Bj = ubj. The matrix in equation (16) is of the form Qv = w 
gi ven by 


Qi 

0 

’vO)’ 


w ) 

L2 

Q2 

v(2) 


w(2) 


which is solved by 


v(l) = Qi-lwd) 

v(.2) = Q 2 - 1 (w( 2 ) - L2v(1)) 

The matrix results from the ordering shown in figure 6(a) and has the structure 
shown in figure 7(a). With this ordering, (LP) is appropriate. As in the ADI 
method, each of the grid points involved in the N/2 independent systems could 
be numbered columnwise so that (SS) is appropriate. This ordering and the 
resulting Q matrix are shown in figure 6(b) and figure 7(b). 


Three-Dimensional Crank-Nicolson 

As a final example, consider a three-dimensional problem that uses a Crank- 
Nicolson differencing procedure. (See, for example, ref. 9.) In figure 8, the 
differencing scheme is shown for the point (i,j,k) looking down on the x-y plane 
at the level z = k. At the point (i,j,k), there is an implicit reference to 
the points immediately above and below at (i,j,k + 1) and (i,j,k - 1), respec- 
tively. There is also reference to the points marked by a cross, as well as 
some values directly above and below them in the z-direction. Now, assuming 
the solution is specified on the boundary, one could start at the point denoted 
by A by grouping all the equations from each plane at that point. A tridiag- 
onal system of equations results for finding the solution at that column of 
points in the z-direction. Since the boundary values are known, the system is 
completely specified with the given differencing. One could now systematically 
proceed to step 1 position in the y-direction, solve that system, step again, 
etc., until all unknowns corresponding to x = 1 are determined. With this 
approach, each tridiagonal system must be solved one at a time. It can be seen, 
however, that once the solution at x = 1 , y = 1 is known, the differencing 
allows the two systems denoted by B to be specified. They can be solved by 
using either the (SS) or (LP) algorithms. Then, as shown, the three systems 
denoted by C can be solved at the next step, etc. Ultimately, one has M 
systems of size N where N is the number of grid points in the z-direction. 

In the ADI m.ethod, both (SS) and (LP) were used due to the desire to remove 
the transpose requirement. In the latter two examples, either algorithm could 
be used. 
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CONCLUDING REMARKS 


Several weaknesses of a straightforward implementation of the ADI method 
on a vector computer have been pointed out. Two algorithms have been presented 
which solve M independent systems of tridiagonal equations of size N in a 
reasonably efficient manner in spite of the fact that no known tridiagonal sol- 
ver solves any one system particularly efficiently. It is shown that if the 
(SS) algorithm is used when the direction of implicitness is opposite the direc- 
tion of storage of the grid values and the (LP) algorithm is used at alternate 
steps, then the apparent necessity to rearrange storage at alternate steps is 
eliminated. 

The mixed algorithm is most likely to be an advantage when the grid is 
rectangular and/or when several transposes are required. This is because the 
greater speed of the (SS) algorithm is less, impor tant . in these instances. In 
the case of a rectangular grid, it is shown that the grid should be ordered 
consecutively along the largest dimension. 

An average vector length for the ADI method is derived which shows that, 
in spite of the shortness of the vector lengths used in the solution to the 
tridiagonal systems, the algorithm may still have a relatively large average 
vector length if the evaluation of the coefficients involves a lot of 
computations. 

Other algorithms which require the solution of numerous tridiagonal systems 
of equations have been discussed and orderings have been shown which lead to an 
uncoupling of the system. 


Langley Research Center 

National Aeronautics and Space Administration 
Hampton, VA 23665 
July 14, 1978 
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Figure 2.- Model grid. 
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Figure 3.- Structure of ADI matrices when implicit in x-direction. 
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Figure 4.- Structure of ADI matrices when implicit in y-direction. 
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